Add support for FixedSizeList to variant_to_arrow #9663

Open
rishvin wants to merge 11 commits into apache:main from rishvin:fixed_size_list_variant_to_arrow

@rishvin rishvin commented Apr 6, 2026

Rationale for this change

Add support for FixedSizeList when invoking variant_to_arrow.

What changes are included in this PR?

  • Introduces a new builder VariantToFixedSizeListArrowRowBuilder.
  • Adds test cases for shredding and getting variant by FixedSizeList.

Are these changes tested?

Yes, by adding a few test cases.

Are there any user-facing changes?

N/A.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Apr 6, 2026
Contributor

alamb commented Apr 6, 2026

@sdf-jkl could you help review this PR?

Contributor

@sdf-jkl sdf-jkl left a comment

Thanks @rishvin, I left some suggestions.

Comment thread parquet-variant-compute/src/variant_to_arrow.rs Outdated
Comment thread parquet-variant-compute/src/variant_to_arrow.rs Outdated
Comment thread parquet-variant-compute/src/variant_to_arrow.rs
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs
@rishvin rishvin requested a review from sdf-jkl April 11, 2026 21:24
Author

rishvin commented Apr 11, 2026

Thanks @sdf-jkl for reviewing the changes. I have addressed the comments. Could you re-review?

Contributor

@sdf-jkl sdf-jkl left a comment

Thanks for your work on this @rishvin! Everything looks great, I left some nits.

}

impl<'a> ListElementBuilder<'a> {
    fn append_null(&mut self) -> Result<()> {
Contributor

Maybe add a comment mentioning this is only used for FixedSizeList builder?

Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs
rishvin and others added 2 commits April 13, 2026 11:53
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
@rishvin rishvin requested a review from sdf-jkl April 13, 2026 19:24
Author

rishvin commented Apr 13, 2026

Thanks @sdf-jkl, I addressed the new comments.

Contributor

@sdf-jkl sdf-jkl left a comment

One tiny change.

Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
@rishvin rishvin requested a review from sdf-jkl April 14, 2026 04:25
Author

rishvin commented Apr 14, 2026

Thanks again @sdf-jkl! Addressed it.

Contributor

@sdf-jkl sdf-jkl left a comment

Thanks @rishvin! LGTM.

@scovich could you please check this when available? Thanks!

Comment on lines +334 to +336
// With the `safe` cast option set to false, appending a list of the wrong size to
// a `typed_value_builder` of type `FixedSizeList` will result in an error. In such a
// case, the provided list should be appended to the `value_builder`.
Contributor

@scovich scovich Apr 14, 2026

I'm not sure the variant shredding spec allows for shredding as a fixed size list, if the resulting layout differs physically from a normal list?

Arrays can be shredded by using a 3-level Parquet list for typed_value.

If the value is not an array, typed_value must be null. If the value is an array, value must be null.

It looks to me like any attempt to shred as fixed-sized list must either succeed (if the size is correct) or hard-fail (because value as fallback is not allowed).
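
The "succeed or hard-fail" reading can be sketched as follows. This is a minimal illustration, not the PR's code: `ToyVariant`, `ShreddedRow`, and `shred_fixed_size` are hypothetical names, and element-level shredding is deliberately omitted.

```rust
/// Hypothetical, simplified stand-in for a variant value
/// (not the parquet-variant crate's `Variant` type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Null,
    Int(i64),
    Array(Vec<ToyVariant>),
}

/// One row of a shredded column: the spec's `value` / `typed_value` pair.
#[derive(Debug, PartialEq)]
struct ShreddedRow {
    value: Option<ToyVariant>,
    typed_value: Option<Vec<ToyVariant>>,
}

/// Shred a single value into a fixed-size list of `size` elements.
/// An array never lands in `value`: it either fits `typed_value` or the
/// whole operation fails.
fn shred_fixed_size(v: &ToyVariant, size: usize) -> Result<ShreddedRow, String> {
    match v {
        // Array of the right length: goes to `typed_value`, `value` stays null.
        ToyVariant::Array(items) if items.len() == size => Ok(ShreddedRow {
            value: None,
            typed_value: Some(items.clone()),
        }),
        // Array of the wrong length: no fallback to `value`, so hard-fail.
        ToyVariant::Array(items) => Err(format!(
            "expected {} elements, got {}",
            size,
            items.len()
        )),
        // Non-array: `typed_value` must be null and `value` carries the data.
        other => Ok(ShreddedRow {
            value: Some(other.clone()),
            typed_value: None,
        }),
    }
}
```

Under this sketch, a two-element array shredded with `size = 2` fills `typed_value`, a one-element array returns an error, and a scalar such as `ToyVariant::Int(7)` is stored in `value` only.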

Contributor

@sdf-jkl sdf-jkl Apr 14, 2026

It differs physically on the Arrow side, but once we write it to Parquet it'd be the same as other ListLikeArrays. This does lead to a further discussion on adding FixedSizeList support for VariantArray, as well as on implementing other types that the spec does not currently support.

We're keeping value because we consider this a cast from Variant to FixedSizeList. The extra len check is there because there is no Variant::FixedSizeList enum variant to match on. If len is incorrect, we consider the cast failed and proceed according to the safe cast option, as if typed_value were Null.
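
The cast semantics described here can be sketched roughly as below. It assumes the usual Arrow cast convention (`safe = true` turns a failed cast into a null, `safe = false` returns an error); `cast_to_fixed_size` is a hypothetical helper, and the variant array is modeled as a plain slice of `i64`.

```rust
/// Hypothetical helper, not the PR's code: cast a variant array (modeled as a
/// slice of i64) to a fixed-size list of exactly `size` elements.
fn cast_to_fixed_size(
    items: &[i64],
    size: usize,
    safe: bool,
) -> Result<Option<Vec<i64>>, String> {
    // The explicit length check stands in for the missing
    // `Variant::FixedSizeList` case mentioned above.
    if items.len() == size {
        return Ok(Some(items.to_vec()));
    }
    if safe {
        // Safe cast: a length mismatch produces a null entry.
        Ok(None)
    } else {
        // Unsafe cast: a length mismatch is reported as an error.
        Err(format!(
            "cannot cast list of length {} to FixedSizeList({})",
            items.len(),
            size
        ))
    }
}
```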

Contributor

When casting from variant to arrow, we can do whatever we want.

But this code here is about going from binary variant to shredded variant. And the variant shredding spec directly forbids value to contain a variant array, when shredding as array.

Contributor

True. I think the core issue is that Parquet currently has only one logical LIST type. If Parquet had a dedicated logical type for FixedSizeList, the spec wording could be more explicit.

Btw, there’s ongoing work on this too: apache/parquet-format#241 (recently revived).

Given the current spec text:

Arrays can be shredded by using a 3-level Parquet list for typed_value.

If the value is not an array, typed_value must be null. If the value is an array, value must be null.

I read "array" as "a value matching the specific list shape we're shredding into". For List/LargeList/ListView that is a List value; for a FixedSizeList array it is a FixedSizeList value.

Contributor

From what I understand, the variant spec neither knows nor cares about the intricacies of arrow array types (it also doesn't care about spark or SQL). If we're shredding to a 3-level parquet list, and we encounter a variant array value, the resulting value column entry must be null.


@cashmand cashmand Apr 15, 2026

Hi, I worked on the shredding spec, and the intent of that line of the spec was to apply to any array, not just one that perfectly matches the shredding schema. For example, in a query with try_cast(v as array<variant>), an engine would be entitled to only fetch the typed_value column from parquet, and produce null for all of the rows where typed_value is null. This would break if value could contain arrays.
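
A small sketch of why that optimization depends on the rule. All names here (`Residual`, `Row`, `cast_typed_only`, `cast_full`) are hypothetical and the types are heavily simplified; the point is only that a reader projecting `typed_value` alone gives the same answer as a full read exactly when `value` never holds an array.

```rust
/// Hypothetical model of what the `value` column may hold for one row.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Residual {
    Scalar(i64),
    Array(Vec<i64>),
}

/// Hypothetical shredded row with both columns.
struct Row {
    value: Option<Residual>,
    typed_value: Option<Vec<i64>>,
}

/// Emulates `try_cast(v as array<...>)` for an engine that fetches only
/// `typed_value` and returns null wherever it is null.
fn cast_typed_only(row: &Row) -> Option<Vec<i64>> {
    row.typed_value.clone()
}

/// Reference behavior that inspects both columns.
fn cast_full(row: &Row) -> Option<Vec<i64>> {
    match (&row.typed_value, &row.value) {
        (Some(items), _) => Some(items.clone()),
        (None, Some(Residual::Array(items))) => Some(items.clone()),
        _ => None,
    }
}
```

For a row such as `Row { value: Some(Residual::Array(vec![1, 2, 3])), typed_value: None }`, `cast_typed_only` returns `None` while `cast_full` returns the array, which is the breakage described above. Rows that follow the spec (an array only ever in `typed_value`) give identical results from both functions.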

Contributor

I see. Thanks for the clarification @cashmand, @scovich!

Development

Successfully merging this pull request may close these issues.

[Variant] Add variant_to_arrow FixedSizeList type support

5 participants